Document Summarization Using Conditional Random Fields

نویسندگان

  • Dou Shen
  • Jian-Tao Sun
  • Hua Li
  • Qiang Yang
  • Zheng Chen
چکیده

Many methods, including supervised and unsupervised algorithms, have been developed for extractive document summarization. Most supervised methods consider the summarization task as a twoclass classification problem and classify each sentence individually without leveraging the relationship among sentences. The unsupervised methods use heuristic rules to select the most informative sentences into a summary directly, which are hard to generalize. In this paper, we present a Conditional Random Fields (CRF) based framework to keep the merits of the above two kinds of approaches while avoiding their disadvantages. What is more, the proposed framework can take the outcomes of previous methods as features and seamlessly integrate them. The key idea of our approach is to treat the summarization task as a sequence labeling problem. In this view, each document is a sequence of sentences and the summarization procedure labels the sentences by 1 and 0. The label of a sentence depends on the assignment of labels of others. We compared our proposed approach with eight existing methods on an open benchmark data set. The results show that our approach can improve the performance by more than 7.1% and 12.1% over the best supervised baseline and unsupervised baseline respectively in terms of two popular metrics F1 and ROUGE-2. Detailed analysis of the improvement is presented as well.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A New Approach to Automatic Summarization by Using Latent Dirichlet Allocation in Conditional Random Field

A New Approach to Automatic Summarization by Using Latent Dirichlet Allocation in Conditional Random Field Xiaofeng Wu, Chengqing Zong (National Lab of Pattern Recognition, Institute of Automation, CAS, Beijing 100190, China) Abustract: In recent years, Latent Dirichlet Allocation(LDA) has been used more and more in Document Clustering, Classification, Segmentation, and some one has used it in ...

متن کامل

Automatic Identification of Rhetorical Roles using Conditional Random Fields for Legal Document Summarization

In this paper, we propose a machine learning approach to rhetorical role identification from legal documents. In our approach, we annotate roles in sample documents with the help of legal experts and take them as training data. Conditional random field model has been trained with the data to perform rhetorical role identification with reinforcement of rich feature sets. The understanding of str...

متن کامل

Automatic Annotation Techniques for Supervised and Semi-supervised Query-focused Summarization

In this paper, we study one semi-supervised and several supervised methods for extractive query-focused multi-document summarization. Traditional approaches to multidocument summarization are either unsupervised or supervised. The unsupervised approaches use heuristic rules to select the most important sentences, which are hard to generalize. On the other hand, huge amount of annotated data is ...

متن کامل

Multi-Document Summarization using Automatic Key-Phrase Extraction

The development of a multi-document summarizer using automatic key-phrase extraction has been described. This summarizer has two main parts; first part is automatic extraction of Key-phrases from the documents and second part is automatic generation of a multidocument summary based on the extracted key-phrases. The CRF based Automatic Keyphrase extraction system has been used here. A document g...

متن کامل

Isotonic Conditional Random Fields and Local Sentiment Flow

We examine the problem of predicting local sentiment flow in documents, and its application to several areas of text analysis. Formally, the problem is stated as predicting an ordinal sequence based on a sequence of word sets. In the spirit of isotonic regression, we develop a variant of conditional random fields that is wellsuited to handle this problem. Using the Möbius transform, we express ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007